Abstract

We present an image representation method which is
derived from analyzing the Gaussian probability density function
(pdf) space using Lie group theory. In our proposed
method, images are modeled by Gaussian mixture models 
(GMMs) which are adapted from a globally trained GMM 
called universal background model (UBM). Then we vec- 
torize the GMMs based on two facts: (1) components of 
image-specific GMMs are closely grouped together around 
their corresponding component of the UBM due to the characteristic
of the UBM adaptation procedure; (2) Gaussian
pdfs form a Lie group, which is a differentiable manifold
rather than a vector space. We map each Gaussian compo- 
nent to the tangent vector space (named Lie algebra) of Lie 
group at the manifold position of UBM. The final feature 
vector, named Lie algebrized Gaussians (LAG), is then constructed
by combining the Lie algebrized Gaussian components
with mixture weights. We apply LAG features to the scene
category recognition problem and observe state-of-the-art
performance on the 15Scenes benchmark.



1. Introduction 

Image representation (feature) is one of the most important
tasks in computer vision. Recently, Gaussian mixture
models (GMMs), which have been widely used for audio
representation [13] in the speech recognition community, have
been adopted to describe images [19, 20]. Compared with
the popular histogram image representation, GMMs have
some attractive advantages (e.g. soft assignment, flexibility
in capturing spatial information) and show better performance
in many visual recognition applications [12, 19, 20, 18].
One of the major problems of GMMs is that they do not
form a vector space and cannot be converted to vectors trivially.
Various vectorization methods for GMM representation
have been developed in the speech recognition community
[13, 2, 11] and adopted in image classification applications
[20]. The problem is clear: mapping elements of a space
formed by Gaussian probability density functions (pdfs) to
a vector space. However, none of the existing solutions take
Figure 1. Illustration of the LAG feature extraction procedure. Firstly,
images are modeled by GMMs over local patch-level features.
Each component of the GMMs is represented as a point in the Lie
group manifold formed by Gaussian pdfs. Since components from
different GMMs are closely grouped together, we vectorize them
by mapping them to the tangent space of the Lie group. Finally, we
combine the vectors of each component into our final LAG feature.



the properties of Gaussian function space into considera- 
tion. To do so, a fundamental question should be answered: 
what kind of space do Gaussian pdfs form? Recently, Gong
et al. theoretically point out that Gaussian pdfs are iso- 
morphic to a special kind of affine matrices which form a 
Lie group. A Lie group is a differentiable manifold which is 
different from ordinary vector spaces. The structure of the 
manifold can be analyzed using Lie group theory. There- 
fore, we can vectorize GMMs to more effective image de- 
scriptors by taking the Lie group properties of Gaussian pdf 
space into consideration. 

In this paper, we propose a novel image representation by
investigating the problem of vectorizing GMMs via analyzing
the Gaussian pdf space. Figure 1 gives an overview of
our proposed method. The procedure of feature extraction
is summarized in the following four major steps.



• First, images are modeled as GMMs over densely sampled
patches. We employ the maximum a posteriori
(MAP) approach used in [13, 19, 20] to estimate the
GMMs: we train a global GMM called the universal background
model (UBM) on the whole image corpus, then
adapt it to each image. Such a UBM adaptation GMM
training approach is much more efficient and effective
than the ordinary expectation maximization (EM)
algorithm.

• Then, we parameterize each component of the GMMs
as an upper triangular definite affine transformation
(UTDAT) matrix. UTDAT matrices are isomorphic 
to Gaussian pdfs. Since UTDAT matrices form a Lie 
group (which means Gaussian pdfs form a Lie group 
too), the Gaussian components are points in the Lie 
group manifold. In Figure 1, the Lie group manifold is
represented by a sphere surface and Gaussian compo- 
nents are represented by points on the surface. Be- 
cause GMMs trained by UBM-MAP have the same 
number of components as the UBM and each compo- 
nent is just a little shift from its corresponding com- 
ponent of the UBM, components are closely grouped 
together around the corresponding component of the 
UBM (represented by red points) in the manifold. 

• Next, we utilize this characteristic of UBM adapted GMMs
to vectorize their components (i.e. UTDAT matrices)
by local mapping. To be precise, we map Gaussian
components to the tangent space of the Lie group 
manifold at the point of corresponding component of 
the UBM. Since the Gaussian components are locally 
grouped together around the UBM, the mapping pre- 
serves the local structure of the manifold. The tangent 
spaces, which are termed as Lie algebras, are ordinary 
vector spaces. 

• Finally, we derive our combined vector formula of 
GMM by approximating the inner product of GMMs 
using a sum of product kernel of mixture weights 
and Lie algebrized components. The final vectorized 
GMM, which is termed as Lie algebrized Gaussians 
(LAGs), is an effective vector representation of the 
original image and is suitable for well known machine 
learning algorithms. 

We apply our proposed LAG feature to scene category
recognition. Discriminant nuisance attribute projection
(NAP) [17] is employed to reduce the intra-class
variabilities. Then a simple nearest centroid (NC) classifier
is adopted to perform the classification task. Our method
shows better performance than state-of-the-art methods on
the 15Scenes benchmark dataset. To be precise, we get 88.4%
average accuracy. Furthermore, experimental results show
that our Lie algebrization approach is superior to the widely 



used Kullback-Leibler (KL) divergence based vectorization
method. 

The remainder of this paper is arranged as follows. In
Section 2, we present a short review of the related work.
Section 3 describes the technical details of LAG. In Section
4, the method for image classification using the LAG feature
is given. Experimental results on the 15Scenes dataset are
reported in Section 5. Section 6 concludes the paper and
outlines some future research issues.

2. Related Work 

In recent years, the bag-of-features (BoF) image representation
has been widely investigated in visual recognition systems.
Inspired by the bag-of-words (BoW) idea in text information
retrieval, BoF treats an image as a collection
of local feature descriptors extracted at densely sampled
patches or sparse interest points, encodes them into discrete
"visual words" using K-means vector quantization (VQ),
then builds a histogram representation of these visual words
[14]. One major problem of the BoF approach is that the spatial
order of the local descriptors is discarded. Spatial information
is important for many visual recognition applications
(e.g. scene categorization and object recognition). To overcome
this problem, Lazebnik et al. propose a BoF extension
called spatial pyramid matching (SPM) [10]. In the SPM
approach, an image is partitioned into $2^l \times 2^l$ sub-images
at different scale levels $l = 0, 1, 2, \ldots$. Then a BoF histogram
is computed for each sub-image. Finally, a vector representation
is formed by concatenating all the BoF histograms.
Because of its remarkable improvements on several image
classification benchmarks like Caltech-101 [3] and Caltech-256 [7],
SPM has become a standard component in most
image classification systems. 
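
The SPM construction described above can be illustrated with a minimal sketch. It assumes that local descriptors have already been quantized to visual-word indices and that patch coordinates are normalized to [0, 1); the function name `spatial_pyramid_histogram` and the toy data are hypothetical, and the per-level weighting used by the original SPM kernel is deliberately left out.

```python
def spatial_pyramid_histogram(points, num_words, levels):
    """Concatenate BoF histograms over 2^l x 2^l grids for l = 0..levels.

    points: iterable of (x, y, word) with x, y in [0, 1) and
            word an integer in [0, num_words).
    """
    feature = []
    for level in range(levels + 1):
        cells = 2 ** level  # the grid at this level has cells x cells sub-images
        hists = [[0] * num_words for _ in range(cells * cells)]
        for x, y, word in points:
            col = min(int(x * cells), cells - 1)
            row = min(int(y * cells), cells - 1)
            hists[row * cells + col][word] += 1
        for hist in hists:
            feature.extend(hist)  # concatenate the sub-image histograms
    return feature


# Toy example: three quantized patches, a vocabulary of two visual words.
points = [(0.1, 0.1, 0), (0.9, 0.1, 1), (0.6, 0.7, 1)]
feature = spatial_pyramid_histogram(points, num_words=2, levels=1)
```

With `levels=1` the result has 2 × (1 + 4) = 10 entries: one global histogram followed by four sub-image histograms.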

On the other hand, GMMs are widely used for speech
signal representation and have become a standard component
in most speaker recognition systems [11, 2, 13]. In
GMM based speech signal representation, low-level features
are extracted at local audio segments, then a GMM is
estimated on these features for each speech clip. Reynolds
et al. [13] propose a novel GMM training method called
universal background model (UBM) adaptation. UBM
adaptation employs a maximum a posteriori (MAP) approach
instead of the normally used maximum likelihood (ML)
approach, e.g. expectation maximization (EM). UBM adaptation
produces more discriminative GMM representations
and is more efficient than ML estimation. Compared with
the BoF histogram representation, GMMs encode the local features
in a continuous probability distribution using soft assignment
instead of hard vector quantization. Zhou et al.
[20] adopt GMMs for image representation and report performance
superior to SPM in several image classification
applications. The problem of the GMM representation is that
GMMs do not form a vector space and cannot be converted
to vectors trivially. To get an effective vector representation,
one should vectorize GMMs according to the structural
properties of the space they form. However, none of the
existing approaches (e.g. [11, 2]) take the structural properties
of the GMM space into consideration.

Feature space structure analysis is a computer vision
topic that has been investigated in recent years. Tuzel et al. [15]
analyze the space structure formed by covariance matrices in
a cascade based object detection scenario. A boosting al- 
gorithm is used to train the node classifiers on covariance 
features for the detection cascade. Since covariance matri- 
ces form a Riemannian manifold, they are mapped to tan- 
gent space at their mean point before feeding to the weak 
learner of each boosting iteration. Compared with treating 
covariance matrices as vectors trivially, significant improve- 
ment is gained by taking the Riemannian manifold property 
of the covariance feature space into consideration during machine
learning. Gong et al. [6] derive a Lie group distance
measure for Gaussian pdfs by analyzing the structure of a
special kind of affine transformation matrix which is isomorphic
to Gaussian pdfs. It has been found empirically that
the Lie group based Gaussian pdf distance is superior to the
widely used Kullback-Leibler (KL) divergence [8, 7].

In this paper, we derive a feature descriptor by analyzing
UBM adapted GMMs using Lie group theory. Compared
with the covariance [15] and Gaussian [6] descriptors,
our proposed LAG descriptor is a holistic descriptor
rather than a local descriptor. Compared with previous
GMM based audio and image representation methods, our
proposed method takes the structural properties of UBM
adapted GMMs into consideration. Experimental results on
scene recognition demonstrate the effectiveness of our method.

3. Lie Algebrized Gaussians for Image Representation

3.1. Image Modeling Using UBM adapted GMM 

We extract local features within densely sampled patches
and represent an image using the probability distribution of
its local features. Specifically, kernel descriptors [1] are
computed for each patch. The distribution of local features
within an image is modeled by a GMM. Let $s$ denote a
patch-level feature vector. The pdf of $s$ is modeled as



$$p(s \mid \theta) = \sum_{k=1}^{K} \omega_k \, \mathcal{N}(s; \mu_k, \Sigma_k) \tag{1}$$



where $K$ denotes the number of Gaussian components and $\mathcal{N}$ is the
multivariate normal pdf. $\omega_k$, $\mu_k$ and $\Sigma_k$ are the weight,
mean vector and covariance matrix of the $k$th component.
For efficiency, we restrict $\Sigma_k$ to be a diagonal
matrix. $\theta = \{\omega_k, \mu_k, \Sigma_k\}_{k=1,2,\ldots,K}$ denotes the whole
parameter set of the GMM.



The descriptive capability of GMM increases with the 
number of Gaussian components K. Normally, hundreds 
of Gaussians are required to build an effective representa- 
tion. Compared with the number of parameters, however, 
the number of patches is small and insufficient to train a 
GMM using a conventional EM approach. Moreover, EM 
is time-consuming for GMMs with hundreds of Gaussians. 
To overcome these problems, we employ a UBM adaptation
approach [13] to estimate the parameters. The adaptation
contains two steps: firstly, a global GMM (i.e. the UBM) is
trained using patches from the training set. Then, the parameters
of each image-specific GMM are adapted from the
UBM using a one-iteration maximum a posteriori approach
as follows



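$$\omega_k = \left[ \alpha_k n_k / T + (1 - \alpha_k) \bar{\omega}_k \right] \gamma \tag{2}$$
$$\mu_k = \alpha_k E_k(s) + (1 - \alpha_k) \bar{\mu}_k \tag{3}$$
$$\sigma_k^2 = \alpha_k E_k(s^2) + (1 - \alpha_k)(\bar{\sigma}_k^2 + \bar{\mu}_k^2) - \mu_k^2 \tag{4}$$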



where $T$ is the number of patches in a specified image.
$\bar{\omega}_k$, $\bar{\mu}_k$ and $\bar{\sigma}_k$ are the weight, mean and standard
deviation of the $k$th mixture of the UBM. $\omega_k$, $\mu_k$ and $\sigma_k$ are the
weight, mean and standard deviation of the $k$th mixture of the
image-specific GMM. The scale factor, $\gamma$, is computed over
all adapted mixture weights to ensure they sum to unity.
$\alpha_k$ is the adaptation coefficient used to control the balance
between the UBM and the image-specific GMM. $n_k$, $E_k(s)$ and
$E_k(s^2)$ are the sufficient statistics of $s$ used to compute the mixture
weights, means and covariances.



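$$n_k = \sum_{t=1}^{T} \Pr(k \mid s_t) \tag{5}$$
$$E_k(s) = \frac{1}{n_k} \sum_{t=1}^{T} \Pr(k \mid s_t) \, s_t \tag{6}$$
$$E_k(s^2) = \frac{1}{n_k} \sum_{t=1}^{T} \Pr(k \mid s_t) \, s_t^2 \tag{7}$$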

where $\Pr(k \mid s_t)$ is the posterior probability that the $t$th
patch belongs to the $k$th Gaussian component.



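$$\Pr(k \mid s_t) = \frac{\bar{\omega}_k \, \mathcal{N}(s_t; \bar{\mu}_k, \bar{\sigma}_k)}{\sum_{m=1}^{K} \bar{\omega}_m \, \mathcal{N}(s_t; \bar{\mu}_m, \bar{\sigma}_m)} \tag{8}$$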



For each mixture, a data-dependent adaptation coefficient
$\alpha_k$ is used, which is defined as

$$\alpha_k = \frac{n_k}{n_k + r} \tag{9}$$



where r is a fixed control value to give penalty to mixtures 
with lower posterior probability. 
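
The one-iteration MAP adaptation in equations (2) to (9) can be sketched in NumPy as follows. This is a minimal illustration rather than the authors' implementation: the function names, the toy UBM and the default control value `r=16.0` are assumptions (the paper does not state the value it uses), and the posteriors are evaluated under the UBM parameters.

```python
import numpy as np


def diag_gauss_logpdf(s, mu, sigma):
    """Log-density of diagonal Gaussians. s: (T, D); mu, sigma: (K, D) -> (T, K)."""
    z = (s[:, None, :] - mu[None, :, :]) / sigma[None, :, :]
    log_norm = np.sum(np.log(sigma), axis=1) + 0.5 * s.shape[1] * np.log(2.0 * np.pi)
    return -0.5 * np.sum(z ** 2, axis=2) - log_norm[None, :]


def map_adapt(s, w_ubm, mu_ubm, sigma_ubm, r=16.0):
    """Adapt UBM weights, means and standard deviations to the patches s of one image."""
    T = s.shape[0]
    # Posterior of each component for each patch (eq. 8), computed in the log domain.
    log_p = np.log(w_ubm)[None, :] + diag_gauss_logpdf(s, mu_ubm, sigma_ubm)
    log_p -= log_p.max(axis=1, keepdims=True)
    post = np.exp(log_p)
    post /= post.sum(axis=1, keepdims=True)
    # Sufficient statistics (eqs. 5-7); n_safe guards components that receive no mass.
    n = post.sum(axis=0)
    n_safe = np.maximum(n, 1e-12)
    e1 = post.T @ s / n_safe[:, None]
    e2 = post.T @ (s ** 2) / n_safe[:, None]
    # Data-dependent adaptation coefficient (eq. 9).
    alpha = n / (n + r)
    # Adapted parameters (eqs. 2-4); dividing by the sum plays the role of gamma.
    w = alpha * n / T + (1.0 - alpha) * w_ubm
    w /= w.sum()
    a = alpha[:, None]
    mu = a * e1 + (1.0 - a) * mu_ubm
    var = a * e2 + (1.0 - a) * (sigma_ubm ** 2 + mu_ubm ** 2) - mu ** 2
    return w, mu, np.sqrt(np.maximum(var, 1e-12))


# Toy example: a 2-component UBM in 2-d and 50 synthetic "patches".
rng = np.random.default_rng(0)
w_ubm = np.array([0.5, 0.5])
mu_ubm = np.array([[0.0, 0.0], [3.0, 3.0]])
sigma_ubm = np.ones((2, 2))
patches = rng.normal(loc=0.5, scale=1.0, size=(50, 2))
w, mu, sigma = map_adapt(patches, w_ubm, mu_ubm, sigma_ubm)
```

A larger `r` keeps the adapted GMM closer to the UBM, which is the behavior the adaptation coefficient is meant to control.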

The parameters of image-specific GMM encode the dis- 
tributions of local patch-level features from a specified im- 
age, thus can be used as an effective visual representation 



3.2. Gaussian pdfs and Lie group 







Figure 2. Components of UBM adapted GMMs are closely
grouped together. We choose three dimensions of three components from
different GMMs and plot them in 3-d Euclidean space. Each point
represents a Gaussian component. Different colors and markers
indicate different component indices (i.e. first, second or third
component of a GMM).



of that image. On one hand, GMMs are continuous pdfs,
which avoids the vector quantization problems of discrete
distribution estimation approaches such as histograms. But
on the other hand, GMMs are not vectors and thus
are not suited to most well-known classifiers, especially linear
classifiers. Of course, we may simply concatenate the
parameters as a vector. But then the structural information of the
original GMM space is discarded. The
most straightforward way to use the structural information
of the GMM feature space is to identify what kind of space it
is and then analyze it using existing theory. Although general
GMMs are complex distributions whose space structure
is difficult to analyze, we observe that UBM adapted
GMMs have some special characteristics which can help us
to analyze their structure.

In Figure 2, we choose three dimensions of three components
from UBM adapted GMMs trained on the 15Scenes
dataset and plot them as points in Euclidean space. It can be
observed that the components of these GMMs are closely
grouped together around the components of the UBM. Such a
characteristic can be explained by the behavior of the UBM
adaptation procedure. In UBM adaptation, the MAP estimation
contains one EM-like iteration only. Moreover,
an adaptation coefficient prevents the resultant GMM from shifting
too far from the prior distribution (i.e. the UBM) in order to
avoid under-fitting. Since components of image-specific
GMMs are closely grouped together, they have correspondence
across images. Therefore, we can analyze Gaussian
components separately, then fuse the results together. In the
rest of this section, we show that Gaussian pdfs form a Lie
group, then derive a vectorization method for GMMs by
analyzing the Gaussian pdf space using Lie group theory.



Let $x_0$ denote a random vector which follows the standard
multivariate Gaussian distribution (i.e. the mean and covariance
are the zero vector and the identity matrix respectively). Let
$x_1 = A x_0 + \mu$ be the result of an invertible affine
transformation of $x_0$. From the properties of the multivariate
Gaussian distribution, we know that $x_1$ is also multivariate
Gaussian distributed. Furthermore, the mean vector
and covariance matrix of $x_1$ are $\mu$ and $A A^T$ respectively.
More generally speaking, any invertible affine transformation
produces a multivariate Gaussian distribution. Furthermore,
if we restrict $A$ to be upper triangular and definite,
we get a unique $A$ for an arbitrary multivariate Gaussian
distribution by Cholesky decomposition, which means there is a
bijection between Gaussian pdfs and upper triangular definite
affine transformations (UTDATs). Therefore, Gaussian
pdfs are isomorphic to UTDATs. Let $M$ denote the matrix
form of a UTDAT, which is defined as follows.

$$M = \begin{bmatrix} A & \mu \\ 0 & 1 \end{bmatrix} \tag{10}$$



We can analyze M instead of Gaussian pdfs. 
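
As a small numerical illustration of this correspondence, the sketch below builds the matrix form from a mean and covariance and maps it back. The helper names (`upper_factor`, `utdat`, `gaussian_from_utdat`) are hypothetical, and the upper triangular factor with $A A^T = \Sigma$ is obtained by applying NumPy's lower triangular Cholesky routine to the index-reversed covariance, which is one possible construction rather than necessarily the one used by the authors.

```python
import numpy as np


def upper_factor(cov):
    """Upper triangular A with positive diagonal such that A @ A.T == cov."""
    # np.linalg.cholesky returns a lower triangular factor; reversing the row
    # and column order before and after turns it into an upper triangular one.
    lower = np.linalg.cholesky(cov[::-1, ::-1])
    return lower[::-1, ::-1]


def utdat(mu, cov):
    """Matrix form [[A, mu], [0, 1]] of the Gaussian with mean mu and covariance cov."""
    d = len(mu)
    M = np.eye(d + 1)
    M[:d, :d] = upper_factor(cov)
    M[:d, d] = mu
    return M


def gaussian_from_utdat(M):
    """Recover (mean, covariance) from a UTDAT matrix."""
    d = M.shape[0] - 1
    A = M[:d, :d]
    return M[:d, d].copy(), A @ A.T


mu = np.array([1.0, -2.0])
cov = np.array([[2.0, 0.6], [0.6, 1.0]])
M1 = utdat(mu, cov)
M2 = utdat(np.array([0.5, 0.5]), np.array([[1.0, -0.3], [-0.3, 0.5]]))
product = M1 @ M2  # the product of two UTDAT matrices is again a UTDAT matrix
mu_back, cov_back = gaussian_from_utdat(M1)
```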

Invertible affine transformations form a Lie group, with
matrix multiplication as the group operator. UTDAT, which
is a special case of invertible affine transformation, is closed
under matrix multiplication and inversion. Therefore, UTDAT
is a subgroup of the invertible affine transformations. Since any
closed subgroup of a Lie group is still a Lie group, UTDAT is a Lie
group. In conclusion, Gaussian pdfs form a Lie group.

In mathematics, a Lie group is a group which is also a 
differentiable manifold, with the property that the group 
operations are compatible with the smooth structure. An 
abstract Lie group could have many isomorphic instances. 
Each of them is a representation of the abstract Lie group.
In Lie group theory, matrix representation ifTFl is a useful 
tool for structure analysis. In our case, UTDAT is the matrix 
representation of the abstract Lie group formed by Gaussian 
pdfs. In particular, the covariance matrices of our GMMs are
diagonal, thus $A$ is diagonal too. Precisely, the UTDAT of the $k$th
component is defined as follows.



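$$M_k = \begin{bmatrix} \sigma_k(1) & & & \mu_k(1) \\ & \ddots & & \vdots \\ & & \sigma_k(D) & \mu_k(D) \\ & & & 1 \end{bmatrix} \tag{11}$$

where $\mu_k(d)$ and $\sigma_k(d)$ are the mean and standard deviation
of the $d$th dimension of the $k$th Gaussian component
respectively, and $D$ is the dimension of the patch-level features.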

3.3. Lie Algebrization of Gaussian components 

As discussed before, components of UBM adapted
GMMs are closely grouped together around their corresponding
components of the UBM, thus we can preserve most of their
structural information by projecting them to the tangent
space of the Lie group at the point of the corresponding
component. In mathematics, the tangent space of a manifold
facilitates the generalization of vectors from affine spaces to
general manifolds, since in the latter case one cannot simply
subtract two points to obtain a vector pointing from one to
the other. Analogous to a tangent plane of a sphere, a tangent
space of a Lie group is a vector space. To best preserve
the structural information of a collection of points in a manifold,
the target vector space should be the tangent space at
the mean point of the point set [15]. In our case, the UBM is
an approximation of all the image-specific GMMs, thus we
use the components of the UBM as mean Gaussian pdfs.

Let $M_k$ and $\bar{M}_k$ denote the $k$th component (in UTDAT matrix
form) of an image-specific GMM and of the UBM. Let $m_k$
denote the corresponding point in the tangent space projected
from $M_k$. The projection is accomplished via the matrix
logarithm.

$$m_k = \log(\bar{M}_k^{-1} M_k) \tag{12}$$

Note that here $\log$ is the matrix logarithm rather than the
element-wise logarithm of a matrix. Since the tangent space of a Lie
group is a vector space, $m_k$ is a vector, thus we can unfold
the elements of $m_k$ into vector form.

Although we can project Gaussian components using
equation (12), it is not efficient. The log operation in (12)
requires a Schur decomposition of $\bar{M}_k^{-1} M_k$ [9], which is
time-consuming. Fortunately, the covariance matrices of Gaussian
components are diagonal in our case, thus we can develop
an efficient scalar form of log.

Here we derive our scalar form of UTDAT matrix loga- 
rithm. For diagonal Gaussian components, each dimension 
of transformation is independent. So we analyze the 1-d 
case of UTDAT logarithm first. Here we let M be a 1-d 
UTDAT matrix with the form 



$$M = \begin{bmatrix} \sigma & \mu \\ 0 & 1 \end{bmatrix} \tag{13}$$



and let $K = M - I$, where $I$ is the 2-d identity matrix. Using
the series form of the matrix logarithm, we have

$$m = \log(M) \tag{14}$$
$$= \log(I + K) \tag{15}$$
$$= \sum_{n=1}^{\infty} (-1)^{n-1} \frac{K^n}{n} \tag{16}$$
$$= \sum_{n=1}^{\infty} \frac{(-1)^{n-1}}{n} \begin{bmatrix} (\sigma - 1)^n & \mu (\sigma - 1)^{n-1} \\ 0 & 0 \end{bmatrix} \tag{17}$$
$$= \begin{bmatrix} \log(\sigma) & \dfrac{\mu \log(\sigma)}{\sigma - 1} \\ 0 & 0 \end{bmatrix} \tag{18}$$

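Note that we can always scale the matrix using the following
equations in order to make sure the series converges. Writing
a matrix $A$ as $A = \lambda (I + B)$ with a positive scalar $\lambda$, we have

$$\log A = \log(\lambda (I + B)) \tag{19}$$
$$= \log(\lambda I) + \log(I + B) \tag{20}$$
$$= (\log \lambda) I + \log(I + B) \tag{21}$$

Using the above equations, for two 1-d UTDAT matrices $M_1$ and $M_2$
with parameters $(\sigma_1, \mu_1)$ and $(\sigma_2, \mu_2)$ we have

$$\log(M_1^{-1} M_2) = \log\left( \begin{bmatrix} \sigma_1 & \mu_1 \\ 0 & 1 \end{bmatrix}^{-1} \begin{bmatrix} \sigma_2 & \mu_2 \\ 0 & 1 \end{bmatrix} \right) \tag{22}$$
$$= \log\left( \begin{bmatrix} \sigma_1^{-1} \sigma_2 & (\mu_2 - \mu_1) \sigma_1^{-1} \\ 0 & 1 \end{bmatrix} \right) \tag{23}$$
$$= \begin{bmatrix} \log\left(\dfrac{\sigma_2}{\sigma_1}\right) & \dfrac{(\mu_2 - \mu_1) \log(\sigma_2 / \sigma_1)}{\sigma_2 - \sigma_1} \\ 0 & 0 \end{bmatrix} \tag{24}$$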

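Note that we use (18) to derive (24) from (23). Finally,
we can get our projected Gaussian component $m_k$ using the
above equations. Writing $m_k(d)$ for the part of $m_k$ contributed
by the $d$th dimension,

$$m_k(d) = \left[ \log\left(\frac{\sigma_{kd}}{\bar{\sigma}_{kd}}\right), \; \frac{(\mu_{kd} - \bar{\mu}_{kd}) \log(\sigma_{kd} / \bar{\sigma}_{kd})}{\sigma_{kd} - \bar{\sigma}_{kd}} \right] \tag{25}$$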

If we assume that $\sigma_k$ always equals $\bar{\sigma}_k$ (i.e. adapt the mean only
and keep the covariance unchanged during UBM adaptation),
$m_k$ is reduced to $\hat{m}_k$, whose entry for the $d$th dimension is

$$\hat{m}_k(d) = \frac{\mu_{kd} - \bar{\mu}_{kd}}{\bar{\sigma}_{kd}} \tag{26}$$

using the fact

$$\lim_{x \to 0} \frac{\log(1 + x)}{x} = 1 \tag{27}$$

Compared with $m_k$, $\hat{m}_k$ represents each dimension using a
scalar (while $m_k$ uses a 2-d vector) and discards the covariance
information of the GMM.
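
For diagonal Gaussians, the projection onto the tangent space at the UBM can be computed for all components and dimensions at once. The following NumPy sketch is an illustration under stated assumptions: each dimension contributes the pair (log of the standard-deviation ratio, scaled mean shift) derived above, the ordering of these pairs inside the output vector is an arbitrary choice, and the function names are hypothetical.

```python
import numpy as np


def lie_algebrize(mu, sigma, mu_ubm, sigma_ubm):
    """Project diagonal Gaussian components onto the tangent space at the UBM.

    All inputs have shape (K, D); the result has shape (K, 2 * D).
    """
    ratio = sigma / sigma_ubm
    log_ratio = np.log(ratio)
    x = ratio - 1.0
    # log(1 + x) / x tends to 1 as x -> 0, so use the limit for tiny x.
    small = np.abs(x) < 1e-8
    scale = np.where(small, 1.0, log_ratio / np.where(small, 1.0, x))
    shift = (mu - mu_ubm) / sigma_ubm * scale
    return np.stack([log_ratio, shift], axis=-1).reshape(mu.shape[0], -1)


def lie_algebrize_reduced(mu, mu_ubm, sigma_ubm):
    """Mean-only variant: one scalar per dimension, covariance information dropped."""
    return (mu - mu_ubm) / sigma_ubm


# One component in 2-d: the first dimension keeps the UBM deviation,
# the second doubles it.
mu_ubm = np.array([[0.0, 1.0]])
sigma_ubm = np.array([[1.0, 2.0]])
mu = np.array([[0.5, 0.0]])
sigma = np.array([[1.0, 4.0]])
m = lie_algebrize(mu, sigma, mu_ubm, sigma_ubm)
m_reduced = lie_algebrize_reduced(mu, mu_ubm, sigma_ubm)
```

When the standard deviations are left unchanged, the second entry of each pair coincides with the reduced form, as the first dimension of the toy example shows.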

3.4. Lie Algebrized Gaussians 

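After vectorization of Gaussian components, we fuse
them together to get a vectorized GMM. We derive our vectorized
GMM from a product kernel. Let $a$ and $b$ denote two
GMMs. We use the kernel function $f(a, b)$ defined as follows

$$f(a, b) = \sum_{k=1}^{K} f_w(\omega_k^a, \omega_k^b) \, f_m(m_k^a, m_k^b) \tag{28}$$

where $f_w$ and $f_m$ are kernel functions for mixture weights
and vectorized Gaussian pdfs. Using the linear inner product
$f_m(x, y) = x^T y$ for vectorized Gaussian pdfs and
$f_w(x, y) = \sqrt{xy}$ for mixture weights, we get

$$f(a, b) = \sum_{k=1}^{K} \sqrt{\omega_k^a \omega_k^b} \, (m_k^a)^T m_k^b \tag{29}$$
$$= \sum_{k=1}^{K} \left( \sqrt{\omega_k^a} \, m_k^a \right)^T \left( \sqrt{\omega_k^b} \, m_k^b \right) \tag{30}$$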

Using equation (30), we design our final vector $v_{lag}$ as

$$v_{lag} = \left[ \sqrt{\omega_1} \, m_1, \sqrt{\omega_2} \, m_2, \ldots, \sqrt{\omega_K} \, m_K \right] \tag{31}$$

The final vector $v_{lag}$, which is named Lie algebrized Gaussians
(LAG), is an effective representation of the original
image and is suitable for most known machine learning
techniques. If we replace $m_k$ with $\hat{m}_k$ in (31), we get a
reduced LAG (rLAG) vector, which has lower dimensionality
but is less discriminative.
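
The construction of the final vector can be sketched as follows, together with a check that the inner product of two such vectors reproduces the product kernel it was derived from. The function names and the random toy inputs are hypothetical; the Lie algebrized components are assumed to be given as one row per mixture component.

```python
import numpy as np


def lag_vector(weights, m):
    """Concatenate sqrt(weight_k) * m_k over all K components.

    weights: (K,) mixture weights; m: (K, P) Lie algebrized components.
    """
    return (np.sqrt(weights)[:, None] * m).ravel()


def product_kernel(w_a, m_a, w_b, m_b):
    """Sum over components of sqrt(w_a * w_b) times the inner product of m_a and m_b."""
    return float(np.sum(np.sqrt(w_a * w_b) * np.sum(m_a * m_b, axis=1)))


rng = np.random.default_rng(1)
K, P = 4, 6
w_a = rng.dirichlet(np.ones(K))
w_b = rng.dirichlet(np.ones(K))
m_a = rng.normal(size=(K, P))
m_b = rng.normal(size=(K, P))

v_a = lag_vector(w_a, m_a)
v_b = lag_vector(w_b, m_b)
kernel_from_vectors = float(v_a @ v_b)
kernel_direct = product_kernel(w_a, m_a, w_b, m_b)
```

Because the kernel is an ordinary inner product of the two vectors, any linear method can be applied to LAG vectors directly.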

4. Scene Category Recognition Using LAG Feature

We apply our LAG feature to scene category recognition. 
Scene recognition is a typical and important visual recognition
problem in computer vision. Some digital cameras (e.g.
Sony W170 and Nikon D3/300) are also starting to include
"Intelligent Scene Recognition" modules to help select an
appropriate aperture, shutter speed, and white balance.

To address the scene recognition problem, we represent
each image using our proposed LAG vector. Since SPM
[10] has been shown empirically to be a useful component
for various visual recognition systems, we adopt it in
our LAG based representation. Specifically, we divide an image
into sub-images in the same manner as SPM and extract
LAG features for each sub-image, then combine these LAG
vectors for image representation. Then we reduce the within-class
variability of LAGs using discriminant nuisance attribute
projection [17]. For efficiency reasons, we employ
a simple nearest centroid (NC) classifier to classify NAP
projected LAG features into different scene categories.
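
The classification step can be illustrated with a minimal nearest centroid sketch. It assumes Euclidean distance between feature vectors and class means, which the text does not specify, and it omits the NAP projection; the function names and the toy 2-d features are hypothetical stand-ins for projected LAG vectors.

```python
import numpy as np


def fit_centroids(X, y):
    """Return the class labels and the mean feature vector of each class."""
    classes = np.unique(y)
    centroids = np.stack([X[y == c].mean(axis=0) for c in classes])
    return classes, centroids


def predict(X, classes, centroids):
    """Assign each row of X to the class with the nearest centroid (Euclidean)."""
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return classes[np.argmin(d2, axis=1)]


# Toy features standing in for image-level vectors of two scene categories.
X_train = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [6.0, 5.0]])
y_train = np.array([0, 0, 1, 1])
classes, centroids = fit_centroids(X_train, y_train)
predicted = predict(np.array([[0.2, 0.3], [5.5, 4.8]]), classes, centroids)
```

Training only needs one mean vector per class, and prediction needs one distance per class, which is why this classifier scales easily.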

5. Experimental Results 

We test our method on the 15Scenes dataset [5, 10]. The
scene dataset contains fifteen scene categories, thirteen of
which are provided by Fei-Fei et al. in [5]. Each scene category
contains about 400 images. The size of each image is
about 300 × 250 pixels. This dataset is the most comprehensive
one for scene category recognition.

We extract kernel descriptors [1] on densely sampled
patches for each image. Specifically, three types of kernel
descriptors are used: color, gradient and LBP kernel
descriptors. Large images are resized to be no larger than
300 × 300. Patches of 16 × 16 and 24 × 24 pixels with a 4-pixel step
are used. The resultant kernel descriptors are reduced to
50-d using principal component analysis (PCA). Each 50-d
vector is then combined with the normalized x-y spatial coordinates
of the center of its patch window. Therefore, the
final patch descriptors are 52-d and contain both appearance
and spatial information of the patches. We model each
image using GMMs with 512 components. Specifically, we
divide each image into 1 × 1 and 2 × 2 pyramid-like sub-images
and estimate a GMM for each sub-image (5 GMMs
for an image in total). The corresponding 5 LAG vectors
are concatenated into a single vector. To test scene recognition
performance, we randomly select 100 images from
each category for training and use the rest for testing. The
experiments are repeated 10 times and 88.4% average recognition
accuracy is obtained. We assemble the performance
of our algorithm and various state-of-the-art algorithms in
Table 1 for comparison. The results show that our method



Algorithm                 Average Accuracy (%)
Histogram [5]             65.2
SPM [10]                  81.4
HG [20]                   85.3
KDES+LinSVM [1]           81.9
KDES+LapKSVM [1]          86.7
LAG+NC (this paper)       88.4



Table 1. Performance of different algorithms on 15Scenes dataset. 
Our LAG+NC method achieves the state-of-the-art performance. 

outperforms all the other algorithms. The kernel descriptor with
Laplacian kernel SVM, which is the second best, obtains
86.7% average recognition accuracy. Note that our method
uses a simple NC classifier rather than kernel machines. It is
also observed that the Laplacian kernel SVM boosts the performance
of kernel descriptors considerably over the linear SVM (81.9%).
So it is interesting to see what kind of kernel SVM can boost
the performance of our LAG representation. However, our
method is more practical because the NC classifier is suitable
for large scale datasets.

The third best algorithm [20] in Table 1 uses a KL divergence
based vectorization together with a spatial information
scheme called Gaussian map as its image representation.
KL divergence based vectorization (KLVec) is one of the
most widely used GMM vectorization approaches and has
been empirically shown to be effective in many applications
[2, 18, 19]. KLVec has the following form
(32)



For a specified dimension (e.g. the $d$th) of a specified component
(e.g. the $k$th), KLVec encodes the distribution as

$$\frac{\mu_{kd}}{\sigma_{kd}} \tag{33}$$



From (33), we can clearly see that our LAG feature is different
from KLVec in two aspects: Firstly, KLVec discards the
Figure 3. Confusion matrices of the three vectorization approaches: (a) KLVec, (b) reduced LAG and (c) LAG. The entry in the ith row
and jth column is the percentage of images from the ith class that are classified as the jth class.



covariance information of the GMM. Secondly, the mean vector in
LAG is centralized by subtracting the corresponding mean
of the UBM. Furthermore, the only difference between rLAG and
KLVec is mean centralization.
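
The per-dimension difference between the three encodings can be made concrete with a short sketch. The function names and numbers are hypothetical, mixture weights are left out, and which standard deviation KLVec divides by is taken here simply as the value passed in.

```python
import math


def klvec_entry(mu, sigma):
    """KLVec: mean divided by standard deviation, no centralization."""
    return mu / sigma


def rlag_entry(mu, mu_ubm, sigma_ubm):
    """Reduced LAG: mean centralized by the UBM mean, scaled by the UBM deviation."""
    return (mu - mu_ubm) / sigma_ubm


def lag_entry(mu, sigma, mu_ubm, sigma_ubm):
    """LAG: (log deviation ratio, scaled centralized mean) for one dimension."""
    ratio = sigma / sigma_ubm
    # log(ratio) / (ratio - 1) tends to 1 when the deviation is unchanged.
    scale = 1.0 if abs(ratio - 1.0) < 1e-12 else math.log(ratio) / (ratio - 1.0)
    return (math.log(ratio), (mu - mu_ubm) / sigma_ubm * scale)


# One UBM dimension and the same dimension adapted to two images.
mu_ubm, sigma_ubm = 10.0, 1.0
kl_a, kl_b = klvec_entry(10.2, 1.0), klvec_entry(9.8, 1.0)
rlag_a, rlag_b = rlag_entry(10.2, mu_ubm, sigma_ubm), rlag_entry(9.8, mu_ubm, sigma_ubm)
lag_b = lag_entry(9.8, 2.0, mu_ubm, sigma_ubm)
```

In this toy case the KLVec entries of the two images are both dominated by the shared offset of about 10, whereas the centralized entries have opposite signs, and the LAG pair additionally records the change in standard deviation.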

To compare KLVec with LAG and rLAG empirically,
we implement KLVec and test it in the same scenario
with the same parameter settings as LAG and rLAG. The average
accuracies of the three vectors are presented in Table 2.

We observe that LAG is significantly superior to KLVec
(88.40% vs 83.84%). The reduced LAG achieves 87.36% accuracy,
which indicates that the centralization operation of
the LAG feature is important. The detailed confusion matrices
of the three vectorization approaches are presented in Figure 3.
Note that our system with KLVec obtains lower accuracy
than the system in [20]. The reason might be that some components
(e.g. spatial Gaussian maps) are not included in our
system.



Algorithm              Average Accuracy (%)
KLVec                  83.84 ± 1.23
rLAG (this paper)      87.36 ± 0.95
LAG (this paper)       88.40 ± 0.96



Table 2. Comparison of different vectorization approaches. The
centralization operation of reduced LAG considerably improves the
performance compared with KLVec. The covariance information
in the LAG vector is also useful for recognition, and improves accuracy
by another 1% over the reduced LAG feature.

We test the three vectorization approaches (LAG, rLAG,
KLVec) with different numbers of Gaussian mixture components
and keep the same settings for the other parameters
as described above. The results are shown in Table 3. According
to the table, LAG is always superior to rLAG and
KLVec. Moreover, LAG achieves fair performance even when the
number of Gaussian mixture components is set to just 32.
This indicates that the covariance matrix
information which is represented by LAG is very useful.

6. Conclusion and Future Work 

We analyze the structure of UBM adapted GMMs and
derive a Lie group based GMM vectorization approach for
image representation. Since Gaussian pdfs form a Lie
group and components of UBM adapted GMMs are closely
grouped together around the UBM, we map each component
of a GMM to the tangent space (Lie algebra) of the Lie group at
the position of the corresponding component of the UBM. Such a
vectorization approach (named Lie algebrization)
preserves the structure of Gaussian components in the original
Lie group manifold. The final Lie algebrized Gaussians
(LAG) features are constructed by combining Lie algebrized
Gaussian components with mixture weights. We
apply LAG to scene category recognition and achieve
state-of-the-art performance on the 15Scenes benchmark with a simple
nearest centroid classifier. Experimental results also
show that our vectorization approach is considerably superior
to the widely used KL divergence based vectorization
method.

There are several interesting issues about LAG based image
representation that we shall investigate in the future. Firstly,
we shall apply LAG to other visual recognition problems,
such as object recognition and action recognition. Secondly, it
is interesting to develop a kernel classifier for GMMs using
their Lie group structure. Finally, applying the LAG feature to
audio representation and comparing it with KL divergence
based vectorization is another interesting topic.